BMJ Health & Care Informatics

BMJ

Preprints posted in the last 7 days, ranked by how well they match BMJ Health & Care Informatics's content profile, based on 13 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Quality and Safety profiles of AI-Generated vs Clinician-Generated Handoffs in Hospital Medicine

Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.

2026-06-08 health informatics 10.64898/2026.06.05.26354946 medRxiv
Top 0.1%
4.7%

End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.

2
When Algorithms Prescribe: A Cross-Sectional Study of Quality, Misinformation, and Engagement in Statin-Related Content on TikTok

Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.

2026-06-08 health informatics 10.64898/2026.06.04.26354962 medRxiv
Top 0.2%
3.5%

Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.

3
Cancer care disruption during the COVID-19 pandemic in Ontario, Canada: A sequential mixed-methods study

Timilshina, N.; Jacobson, D.; Birze, A.; Wodchis, W. P.; Kuluski, K.; Strumpf, E.; Ammi, M.

2026-06-12 health systems and quality improvement 10.64898/2026.06.10.26355360 medRxiv
Top 0.2%
3.1%

Introduction The COVID-19 pandemic profoundly disrupted healthcare delivery worldwide, with cancer care among the most affected services. Prior studies documented delays in referrals, reduced specialist access, and increased provider burden. However, the extent to which these experiences were reflected at the system level remains unclear. Objective To document cancer care experiences and examine whether these experiences were reflected in population-level health system indicators across Ontario, Canada. Methods We used an exploratory sequential mixed-methods design. Qualitative data were collected through focus groups and semi-structured interviews with 32 participants, including patients with cancer (n=8), caregivers (n=5), healthcare providers (n=14), and decision-makers (n=5) across two hospital settings in Ontario, Canada. Emergent themes informed the development of quantitative indicators. We then conducted a retrospective population-based analysis of linked administrative health databases for cancer patients in Ontario (n=87,786) to assess the prevalence of identified themes. Results Four themes emerged: (I) delays in diagnosis and screening; (II) disrupted access to primary care; (III) barriers to specialist and mental health services; and (IV) fragmented care for patients with multimorbidity. Quantitative findings corroborated major themes. Screening rates declined for cervical (64.8% to 57.5%) and breast cancer (64.5% to 57.2%). While in-person primary care shifted almost entirely to virtual modalities (8.5% to 95.4%), overall visit volumes remained stable. Specialist care showed uneven patterns, with increased oncology visits but declines in cardiology and mental health services. Patients with multiple comorbidities experienced the largest reductions in non-oncology specialist care. Conclusion The pandemic disrupted key components of cancer care, particularly screening, access to certain specialist services, and care for patients with complex needs. 
Integrating qualitative and quantitative evidence highlights areas of system vulnerability and underscores the need for coordinated, resilient cancer care capable of maintaining essential services during future crises.

4
From Charting Burden to Workflow Signal: Retrospective Validation of Documentation-Density Measures for ICU Complexity and Long-Stay Risk

Collier, A.

2026-06-06 health informatics 10.64898/2026.06.04.26354922 medRxiv
Top 0.2%
2.7%

Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
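The label-shuffle control reported above can be illustrated with a minimal sketch. It assumes nothing about the study's actual pipeline: the data are synthetic, and the function names `auroc` and `label_shuffle_auroc` are hypothetical. The idea is only that a feature with real signal keeps a high AUROC, while the same scores evaluated against permuted labels should fall back to roughly 0.5.

```python
import random

def auroc(labels, scores):
    """AUROC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def label_shuffle_auroc(labels, scores, n_shuffles=200, seed=0):
    """Mean AUROC after randomly permuting the labels (expected to be near 0.5)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += auroc(shuffled, scores)
    return total / n_shuffles

# Synthetic example (not study data): 50 negatives scored 0..49, 50 positives scored 25..74.
labels = [0] * 50 + [1] * 50
scores = list(range(50)) + [25 + j for j in range(50)]

print("true-label AUROC:", auroc(labels, scores))  # 0.875
print("label-shuffle AUROC:", round(label_shuffle_auroc(labels, scores), 3))
```

A shuffled AUROC far from 0.5 would suggest leakage or an evaluation bug rather than genuine signal. A near-random value only rules out those artifacts and does not by itself validate the feature.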

5
Technology acceptance of machine learning in life sciences: the role of hype perception and journal impact factor.

Serrano, A. E.

2026-06-09 health informatics 10.64898/2026.06.03.26354262 medRxiv
Top 0.2%
2.7%

Machine learning (ML) has emerged as a transformative technology across biomedical and life science sectors, with applications spanning drug discovery, medical imaging, genomics, and clinical decision support (Goecks et al., 2020; Patel et al., 2020). Despite exponential growth in ML-related publications, from fewer than 100 articles in 2003 to nearly 25,000 by 2021 (NCBI, 2022), adoption among industry professionals remains uneven and sector-dependent. Understanding what drives or inhibits this adoption is critical for organisations seeking to leverage ML capabilities in research and clinical practice. Technology adoption in organisational contexts has been extensively studied through the Technology Acceptance Model (TAM), originally proposed by Davis (1989) and subsequently extended to incorporate external variables influencing perceived usefulness (PU) and perceived ease of use (PEU) (Venkatesh & Davis, 1996). While TAM has been applied across multiple industries, its application within biomedical and life science contexts remains limited, and the industry-specific factors that shape ML acceptance in this sector have not been systematically examined. Two external variables are particularly relevant to life science professionals. First, the bibliometric journal impact factor (JIF) functions as a cognitive signal of scientific credibility in a sector where evidence-based decision-making is culturally embedded and publication quality serves as a proxy for technological legitimacy (Garfield, 1996). Second, technology hype, operationalised through the Gartner Hype Cycle framework, represents a social influence variable that shapes organisational expectations and investment decisions around emerging technologies (Gartner Inc., 2018). Whether these variables influence ML acceptance among life science professionals, alongside individual knowledge and experience, has not been empirically tested. 
This study addresses that gap by investigating ML technology acceptance among 213 biomedical and life science professionals across EMEA, LATAM, and North America, using a cross-sectional quantitative survey and PLS-SEM analysis. The TAM model is extended with three external variables, JIF, technology hype, and prior knowledge and experience, to test their influence on PU and PEU in this specific professional context. Additionally, the study examines demographic and regional differences in ML acceptance, with particular attention to variation between academic researchers and healthcare professionals. The findings contribute a validated, sector-specific extension of TAM for life sciences, provide actionable insights for organisations seeking to accelerate ML implementation, and establish a framework for future subsector-specific research.

6
A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics 10.64898/2026.06.05.26354934 medRxiv
Top 0.4%
1.8%

Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and was trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved using maximum lipase value as an outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events. 
Using an outcome based on specific phenotype knowledge (maximum lipase value) in the automated model allowed performance similar to the custom model with considerably fewer resources.
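Reporting sensitivity "at a positive predictive value of 90%" implies choosing an operating threshold on the model score. The sketch below shows one plausible way to do that and is not the authors' procedure: it scans candidate thresholds and keeps the highest sensitivity among those whose PPV meets the target. The function name `sensitivity_at_ppv` and the toy data are assumptions for illustration.

```python
def sensitivity_at_ppv(labels, scores, target_ppv=0.90):
    """Highest sensitivity over score thresholds whose PPV is at least target_ppv.

    labels: 1 for validated case, 0 for non-case. scores: model outputs (higher = more likely).
    Returns 0.0 if no threshold reaches the target PPV.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    best = 0.0
    for thr in sorted(set(scores)):
        # Everything scored at or above the threshold is flagged as a case.
        flagged = [y for y, s in zip(labels, scores) if s >= thr]
        true_pos = sum(flagged)
        if flagged and true_pos / len(flagged) >= target_ppv:
            best = max(best, true_pos / total_pos)
    return best

# Toy data (hypothetical): five cases, three non-cases.
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(sensitivity_at_ppv(toy_labels, toy_scores, target_ppv=0.90))  # 0.8
```

Fixing PPV and comparing sensitivity, as the abstract does, puts models with different score scales on a common footing.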

7
A Data-Driven Framework for Generating Population-Linked Case Vignettes from Nationwide Triage Data

Seidel, A.; Steiger, E.; Schuster, J.; Kroll, L. E.

2026-06-10 health informatics 10.64898/2026.06.08.26354886 medRxiv
Top 0.4%
1.7%

Background: Digital decision-support tools such as triage systems and symptom checkers support millions of health-related decisions each year. Their quality and safety are commonly evaluated using textual patient cases, known as case vignettes. However, existing vignette sets written by medical experts cover only a limited spectrum of real-world patient presentations and lack population weights, which would allow extrapolating evaluation results to the underlying patient population. Objective: This study aims to develop a data-driven framework for automatically generating a human-manageable set of case vignettes from nationwide triage data that captures broad presentation diversity and links each vignette to a quantitative weight reflecting the number of underlying patient assessments. Methods: From 3.2 million triage assessments conducted over one year using structured triage software in the German medical on-call service (telephone triage and online self-triage) and at the joint contact points of the outpatient emergency care service and hospital emergency departments, we randomly sampled 50,000 cases. Triage questionnaires were converted into semantic embeddings using a German Sentence Transformer Model and grouped by agglomerative clustering. For clusters containing sufficient assessments, we generated one representative assessment using a two-phase simulated-annealing optimization. The optimization minimized the distance to the cluster centroid while maximizing the number of answered triage questions, aiming for high representativeness and information content. Each representative assessment was assigned the size of its source cluster as its sample-based weight. A similarity-based sensitivity analysis was performed to examine whether these weights were preserved in the full 1-year population. 
Finally, the question-answer pairs of the representative assessments were converted into structured textual case vignettes using controlled prompting of a large language model. Results: The cluster analysis yielded 514 included clusters covering 96.8% of the sampled 50,000 assessments. The generated representatives showed strong agreement with the majority treatment-urgency recommendation of their source cluster (Spearman's ρ=0.78, p<0.001) and contained on average 4.3 more answered triage questions than the original assessments within their clusters. When weighted by cluster size, the representatives approximated the sample distributions of treatment urgency, demographics, and symptoms, although some systematic deviations remained, most notably an overrepresentation of female cases (+13.5%), patients aged 14-49 years (+8.0%), and the urgency category "As soon as possible" (+6.6%). Of 121 recorded symptoms, 101 (83.5%) were covered by the representatives; the rest each occurred in <0.5% of the sample. In a sensitivity analysis, cluster-based vignette weights were strongly correlated with similarity-based population weights (Spearman's ρ=0.77, p<0.001), and 90.1% of assessments in the full 1-year population were matched to at least one vignette. Conclusions: We present a data-driven framework for deriving a manageable set of population-weighted case vignettes from nationwide triage data. The resulting vignettes captured broad presentation diversity, approximated key sample characteristics, and provided an explicit quantitative link to the number of underlying patient assessments. After medical expert review and refinement, the vignettes may support more population-aware evaluation and quality assurance of digital decision-support tools.
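The weighting step described in this abstract (one representative per sufficiently large cluster, weighted by cluster size) can be sketched as follows. This is a simplified illustration under stated assumptions: cluster assignments are taken as given, and the representative is simply the member closest to the centroid. That substitutes for the study's two-phase simulated-annealing optimization and is not a reimplementation of it. The function name, the `min_size` parameter, and the toy points are hypothetical.

```python
import math
from collections import defaultdict

def weighted_representatives(points, cluster_ids, min_size=2):
    """Pick one representative per cluster and weight it by the cluster's size.

    points: list of equal-length numeric vectors (e.g. embeddings).
    cluster_ids: cluster label for each point.
    Clusters smaller than min_size are excluded.
    Returns (representatives, coverage), where coverage is the share of points
    that belong to an included cluster.
    """
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)

    reps = {}
    for cid, members in clusters.items():
        if len(members) < min_size:
            continue
        dim = len(points[members[0]])
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        # Simplification: nearest existing member instead of an optimized synthetic case.
        rep = min(members, key=lambda i: math.dist(points[i], centroid))
        reps[cid] = {"representative": rep, "weight": len(members)}

    covered = sum(r["weight"] for r in reps.values())
    return reps, covered / len(points)

toy_points = [(0, 0), (0, 2), (0, 1), (10, 10), (10, 12), (50, 50)]
toy_clusters = [0, 0, 0, 1, 1, 2]
reps, coverage = weighted_representatives(toy_points, toy_clusters)
print(reps)
print(f"coverage: {coverage:.3f}")
```

In this toy run the singleton cluster is dropped, so coverage is 5/6. The abstract's 96.8% figure is the analogous quantity for its 514 included clusters.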

8
AutoClip: AI-Guided TEE Semantic Segmentation for TEER: A Proof-of-Concept Study

Chen, M.; Li, X.; Yang, K.; Taramasso, M.

2026-06-06 cardiovascular medicine 10.64898/2026.05.29.26354195 medRxiv
Top 0.5%
1.6%

Background: Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. Objectives: To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. Methods: A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. Results: The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. Conclusions: AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise. 
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
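For readers less familiar with the two metrics named in this abstract, the sketch below computes per-class IoU and Dice on label masks and averages IoU over foreground classes. It is a generic illustration, not the authors' evaluation code. Masks are flattened to plain lists, label 0 is assumed to be background, and the function names are hypothetical. For a single class the two metrics are linked by Dice = 2·IoU / (1 + IoU), so a per-class IoU of 0.46 would correspond to a Dice of roughly 0.63. That identity does not hold exactly for means taken across classes.

```python
def iou_and_dice(pred, truth, label):
    """Return (IoU, Dice) for one class over two flattened label masks."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    n_pred = sum(1 for p in pred if p == label)
    n_true = sum(1 for t in truth if t == label)
    union = n_pred + n_true - inter
    if union == 0:
        # Class absent from both masks: the metrics are undefined.
        return float("nan"), float("nan")
    return inter / union, 2 * inter / (n_pred + n_true)

def mean_foreground_iou(pred, truth, labels, background=0):
    """Average IoU over non-background classes, skipping undefined (absent) classes."""
    ious = [iou_and_dice(pred, truth, lab)[0] for lab in labels if lab != background]
    ious = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    if not ious:
        raise ValueError("no foreground class present in either mask")
    return sum(ious) / len(ious)

# Toy masks (hypothetical): 0 = background, 1 and 2 = two foreground structures.
toy_pred = [0, 1, 1, 2, 2, 2]
toy_truth = [0, 1, 2, 2, 2, 1]
print(iou_and_dice(toy_pred, toy_truth, 1))
print(iou_and_dice(toy_pred, toy_truth, 2))
print(mean_foreground_iou(toy_pred, toy_truth, labels=[0, 1, 2]))
```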

9
Surviving Severe Acute Brain injury: Care trajectories and missed opportunities

Bunker, A. L.; Engelberg, R. A.; Holloway, R. G.; Creutzfeldt, C. J.

2026-06-09 neurology 10.64898/2026.06.01.26354480 medRxiv
Top 0.5%
1.5%

INTRODUCTION Severe acute brain injury (stroke, traumatic brain injury or hypoxic-ischemic encephalopathy; SABI) is increasingly recognized as a chronic condition with care and communication needs beyond the initial hospitalization. This study aimed to characterize post-acute care patterns among SABI survivors, focusing on healthcare utilization and outpatient communication. METHODS Data were collected from a prospective cohort of hospitalized SABI patients using surveys, chart reviews, and the ED Information Exchange database. Socioeconomic disadvantage was assessed using the Area Deprivation Index (ADI), and qualitative analysis of outpatient notes examined conversations around palliative care needs and goals-of-care. RESULTS Two-thirds of patients (140/222) survived until discharge, primarily to nursing facilities (39%) or inpatient rehabilitation (38%). Among 109 with one-year follow-up, there were 89 hospitalizations, 104 ED visits, and 28 deaths. Patients from the most disadvantaged neighborhoods had significantly higher odds of rehospitalization or ED use within 30 days (OR 3.37, p=0.036). ADI was not linked to one-year utilization. Patients were seen outpatient by primary care (40%), neurology/neurosurgery (57%), and palliative care (1%), but conversations rarely revisited prognosis or goals-of-care. CONCLUSIONS Our findings highlight the need for improved long-term care planning and communication, particularly for socioeconomically disadvantaged survivors of SABI.

10
An AI-assisted feasibility evaluation of three photoplethysmography-derived microvascular reactivity signals in MIMIC-IV-WDB v0.1.0

Landry, T. C.; Kim, Y.

2026-06-06 health informatics 10.64898/2026.06.03.26354863 medRxiv
Top 0.6%
1.2%

Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth. 
The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
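The "approximately 318 ms" figure is what one would expect from a first-order high-pass filter with a 0.5 Hz cutoff, whose time constant is 1 / (2π·fc). The abstract does not state the filter order, so first-order behaviour is an assumption here, and the function names are hypothetical. A short check:

```python
import math

def highpass_time_constant_s(cutoff_hz: float) -> float:
    """Time constant in seconds of a first-order high-pass filter: tau = 1 / (2*pi*fc)."""
    if cutoff_hz <= 0:
        raise ValueError("cutoff_hz must be positive")
    return 1.0 / (2.0 * math.pi * cutoff_hz)

def step_response(t_s: float, cutoff_hz: float) -> float:
    """Filter output at time t_s after a unit step input (assumed first-order): exp(-t/tau)."""
    return math.exp(-t_s / highpass_time_constant_s(cutoff_hz))

tau_s = highpass_time_constant_s(0.5)
print(f"tau = {1000.0 * tau_s:.1f} ms")  # tau = 318.3 ms
print(f"output after one tau: {step_response(tau_s, 0.5):.3f}")  # output after one tau: 0.368
```

Because any step-like input decays through such a filter with this fixed time constant, an exponential fitted to the filtered waveform can recover the filter's own decay instead of the underlying physiology. That is the failure mode the abstract reports.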

11
Healthy Heart Actions Right Time (HHART): Co-design priorities to connect Aboriginal and Torres Strait Islander community and clinic activities for healthy hearts

Wyber, R.; Zagler, J.; Liu, C.; Yadav, U. N.; O'Dwyer, Z.; Hart, K.; Chapman, K.; McGrady, L.; Kohn, A.; Winterfield, N.; Williams, D.; Watson, N.; Morey, K.; Pearson, O.

2026-06-10 primary care research 10.64898/2026.06.05.26354870 medRxiv
Top 0.8%
0.9%

Aim: Healthy Heart Actions Right Time (HHART) is a multi-phased research project that seeks to identify, implement and evaluate strategies to connect community and clinical activities to reduce the burden of heart disease for Aboriginal and Torres Strait Islander people. The aim in Phase One was to identify priority activities for two participating services. Background: The ongoing effects of colonisation drive a disproportionate burden of heart disease for Aboriginal and Torres Strait Islander people. Clinical and community groups both have established strengths in reducing the risk of heart disease, but these are not always well connected. Methods: Using a case study methodology in two locations, we partnered in a 12-month co-design process to identify priority activities to connect clinical and community activities. Findings: Three priorities emerged from the Phase One co-design process: (i) community-led gardening as a strategy to promote heart health through connection and healthy lifestyles; (ii) community days to increase engagement in heart checks and strengthen the community-clinic relationship; and (iii) clinic-led development of culturally relevant education resources to promote clinician confidence and community heart health knowledge.

12
Correlates of time to presentation for stroke care among patients at a tertiary hospital in Ondo State, Nigeria: A retrospective records review

Ogunsemoyin, O.; Fayehun, O.

2026-06-09 health policy 10.64898/2026.06.06.26355064 medRxiv
Top 0.8%
0.9%

Introduction: Early hospital presentation after stroke onset is necessary for rapid assessment and access to time-dependent acute management. This study examined the correlates of late presentation for stroke care among patients recorded at a tertiary hospital in Ondo State, Nigeria. Methods: A retrospective records review was conducted using secondary data from the Stroke Registry of the University of Medical Sciences Teaching Hospital, radiology department records, referral notes, and ambulance records. Records of stroke cases documented within the preceding 24 months were reviewed. Late presentation was defined as hospital presentation more than four hours after symptom onset. Frequencies, chi-square tests, and modified Poisson regression with robust standard errors were used to estimate adjusted prevalence ratios. Results: The analysis included 371 stroke cases. Of these, 317 (85.4%) presented after four hours, and the median time to presentation was 24 hours (interquartile range: 9-72 hours). Late presentation differed significantly by employment status, first-contact route, and pathway complexity at bivariate analysis. After adjustment, non-hospital first contact remained strongly associated with late presentation: patients whose first documented contact was non-hospital-based had almost 3 times the prevalence of delay compared with those whose first contact was hospital-based (adjusted prevalence ratio = 2.89; 95% confidence interval: 2.15-3.90; p < 0.001). Conclusion: Late presentation was pervasive in this tertiary hospital record cohort and was primarily associated with the initial direction of care-seeking. Stroke response interventions should emphasise immediate hospital presentation and strengthen urgent referral from non-hospital first-contact points.
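A prevalence ratio compares the proportion with the outcome in two groups. The sketch below computes a crude (unadjusted) prevalence ratio with a log-scale Wald confidence interval from a 2x2 table. It is not the modified Poisson regression with robust standard errors used in the study, which additionally adjusts for covariates. The counts are hypothetical, since the abstract does not report the underlying table, and the function name is an assumption.

```python
import math

def prevalence_ratio(exp_events, exp_total, unexp_events, unexp_total, z=1.96):
    """Crude prevalence ratio with a log-scale Wald interval.

    exp_*: events and group size in the exposed group (e.g. non-hospital first contact).
    unexp_*: events and group size in the reference group (e.g. hospital first contact).
    Returns (ratio, lower, upper).
    """
    if min(exp_events, unexp_events) <= 0:
        raise ValueError("both groups need at least one event for the log-scale interval")
    ratio = (exp_events / exp_total) / (unexp_events / unexp_total)
    # Standard error of log(ratio) for two independent binomial proportions.
    se_log = math.sqrt(1 / exp_events - 1 / exp_total + 1 / unexp_events - 1 / unexp_total)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)

# Hypothetical counts for illustration only: 90/100 late in one group, 30/100 in the other.
pr, lower, upper = prevalence_ratio(90, 100, 30, 100)
print(f"PR = {pr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Prevalence ratios are often preferred to odds ratios when the outcome is common, as it is here (85.4% late presentation), because odds ratios then overstate the relative difference.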

13
Healthcare professionals' perspectives on a multilevel cardiovascular risk management intervention (PROSPERA programme)

Bongaerts, V. A. M. C.; van Gestel, L. C.; van Peet, P. G.; Vuijk, M.-L. S.; Hageman, S. H. J.; Dorresteijn, J. A. N.; Bonten, T. N.; Numans, M. E.; van Os, H. J. A.; Vos, R. C.

2026-06-09 cardiovascular medicine 10.64898/2026.06.08.26355169 medRxiv
Top 0.8%
0.8%

Background: Two-thirds of Dutch cardiovascular risk management (CVRM) for patients at risk of cardiovascular disease is delivered in primary care practices. While individual risk scores are increasingly used during consultation, a population-level structure for risk-based patient outreach is not currently available. We therefore developed the PROSPERA programme, a multilevel intervention comprising population-level risk stratification and individual-level support tools. Aim: To assess anticipated and experienced barriers and facilitators among healthcare professionals (HCPs) to inform implementation in primary care. Methods: We conducted four focus groups and six interviews with nine primary care HCPs to explore anticipated and experienced barriers and facilitators. Inductive codes were thematically analysed and assigned to corresponding domains of the Theoretical Domains Framework (TDF) and the related Capability, Opportunity, Motivation model of Behaviour. Results: Barriers and facilitators were identified in 11 TDF domains. Population-level barriers included altered professional roles and limitations in technological infrastructure. Individual-level barriers were limited skills in interpreting risk calculations and difficulty integrating tools into clinical routine. Facilitators were related to beliefs on the importance of providing proactive care (population level), the use of U-Prevent for risk communication (individual level) and positive patient responses to the Lifestylecheck questionnaire (individual level). Conclusion: Addressing barriers and facilitators identified at both the population and individual levels can support implementation of the PROSPERA programme. Opportunities exist in education and training of HCPs in risk communication, as well as support in restructuring the physical and digital environment.

14
Positioning Early Phase CNS Trials for Regulatory and Investor Success: Strategic Implications of the Single Phase 3 Approval Paradigm

Schmidt, P.; Preskorn, S.

2026-06-08 neurology 10.64898/2026.06.05.26353604 medRxiv
Top 0.9%
0.8%

In February 2026, the FDA announced that a single pivotal phase 3 (P3) trial would become the new default standard for drug approval - a regulatory direction that had been legally enabled since the FDA Modernization Act of 1997. This announcement has strategic, scientific, and economic implications for drug developers, contract research organizations (CROs), and biotech investors. We argue that the expansion of this framework, originally reserved for various niche submissions, represents a paradigm change, dramatically increasing the value of rigorous early phase (P1 and P2) trial design, requiring sponsors to establish both statistical efficacy signals and mechanistic biological understanding before entering phase 3. Using a CNS indication cost model, we show that single P3 approval can reduce total development expenditure from approximately $447 million over 14 years to $297 million over 12 years - a savings of $150 million that also provides two additional years of commercial runway for a modeled CNS drug. Case examples including lecanemab, omaveloxolone, and tofersen illustrate how biomarker-informed early phase strategies can establish the confirmatory evidence necessary for single-trial approval. We provide practical guidance for maximizing the value of P1 and P2 under this evolving framework.

15
A hierarchical clinical fusion transformer model for personalized opioid treatment: Development and validation in diabetic surgical patients

Naderalvojoud, B.; Sutjiadi, B. J.; Koul, A.; Curtin, C.; Gevaert, O.; Hernandez-Boussard, T.

2026-06-08 health informatics 10.64898/2026.06.04.26353331 medRxiv
Top 0.9%
0.7%

Background: Machine learning (ML) models are increasingly used to predict adverse outcomes after surgery. However, most rely on static patient characteristics (e.g., age, comorbidities) and overlook clinician-controlled treatment decisions that can be actively modified at the point of care. Discharge opioid prescribing is a key modifiable, clinician-controlled decision, yet optimizing prescribing choices across multiple adverse outcomes remains underexplored in predictive modeling. This study addresses that gap by introducing a novel ML framework that explicitly separates fixed patient risk factors from modifiable prescribing options to support personalized, risk-informed opioid prescribing decisions. Methods: We developed the Hierarchical Clinical Fusion Transformer (HCF-Transformer), an ML model designed to estimate patient-specific risks across four postoperative outcomes: prolonged opioid use (POU), chronic pain (CP), 30-day readmission, and opioid-associated outcomes (OAO). The model constructs patient risk profiles from fixed, non-modifiable baseline factors, followed by a transformer layer. Clinician-controllable discharge opioid regimens are modeled as alternative intervention candidates and fused with the fixed risk representation through a clinical fusion mechanism, enabling assessment and ranking based on predicted risks. A Total Relative Risk (TRR) metric, calibrated to each outcome prediction threshold, guides the recommendation process. We evaluated the model in diabetic surgical patients, a common high-risk population. Results: The study included 157,853 unique diabetic surgical patients, with outcome prevalences ranging from 47.2% (POU) to 1.8% (OAO). The HCF-Transformer achieved the highest AUROCs, 0.798 for POU, 0.712 for 30-day readmission, 0.808 for CP, and 0.922 for OAO, outperforming Random Forest, FT-Transformer, and ResNet-based models. 
Compared to these baselines, HCF-Transformer generated more stable and discriminative risk estimates and demonstrated significant variation in TRR scores across discharge opioid options (ANOVA p < .01, eta-squared > .01). This enabled consistent identification of lower-risk regimens tailored to patient-specific profiles. Conclusions: The HCF-Transformer introduces a novel hierarchical fusion approach to optimize opioid prescribing by integrating static patient risk profiles with modifiable discharge options. Using transformer-based modeling and a quantifiable TRR metric, the model delivers personalized, risk-aware recommendations. This approach enables data-driven opioid prescribing tailored to individual risk and has the potential to improve postoperative outcomes in high-risk populations. Our findings demonstrate that integrating modifiable factors with structured risk profiles through a transformer-based fusion architecture can enhance decision-support systems, paving the way for more actionable and personalized AI in healthcare.

16
Care-seeking pathways and time to tertiary hospital presentation for stroke care in Ondo State, Nigeria

Ogunsemoyin, O.; Fayehun, O.

2026-06-08 health systems and quality improvement 10.64898/2026.06.04.26354906 medRxiv
Top 1%
0.5%

Introduction: Stroke care is time-sensitive, yet patients in low-resource settings may reach tertiary services only after passing through multiple formal and informal care options. This study examined documented care-seeking pathways and time to presentation among stroke cases recorded at the University of Medical Sciences Teaching Hospital (UNIMEDTH), Ondo State, Nigeria. Methods: A retrospective hospital record review was conducted using secondary data from the Stroke Registry, radiology department records, referral notes, and ambulance records at UNIMEDTH. The analysis included 371 stroke cases with documented time from symptom onset to UNIMEDTH presentation and reconstructable care pathways. First-contact routes were classified as hospital/biomedical, self/informal or traditional/faith-based care, and the number of documented steps defined pathway complexity before and including tertiary presentation. Frequencies and percentages described pathway patterns; median presentation times were compared using Mann-Whitney U and Kruskal-Wallis tests. Results: The median time to tertiary presentation was 24 hours (interquartile range [IQR] 9-72), and 317 patients (85.4%) presented after four hours. Only 30 patients (8.1%) presented directly to UNIMEDTH; 44 distinct care-pathway sequences were recorded. Hospital-facility first contact was documented for 81 patients (21.8%). It was associated with a median presentation time of 3 hours (IQR 2-6), compared with 48 hours (IQR 24-72) among patients whose initial contact was outside a hospital facility (U = 699.50, p < 0.001). The median time also differed across grouped first-contact categories and pathway complexity levels (both p < 0.001). Conclusion: Non-hospital or multi-step care-seeking pathways commonly preceded tertiary stroke presentations in this setting. The findings indicate that delayed tertiary arrival is partly embedded in the pathway followed after symptom onset. 
Interventions should combine public recognition of stroke warning signs with urgent referral linkages involving hospitals, patent medicine vendors, traditional and faith-based providers, and emergency transport systems.

17
Characterizing Documented Psychosocial Stressors in Pediatric Psychiatric Emergencies with an Open-Weight Large Language Model

Hartlage, C. S.; Manning, E. R.; Bernard, J.; Vaish, S.; Gray, J.; Young, M.; Pestian, T.; Folger, A. T.; Tachinardi, P.; Mendonca, E. A.; Brokamp, C.

2026-06-09 health informatics 10.64898/2026.06.08.26354931 medRxiv
Top 1%
0.5%

Objective: To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes and apply validated extraction to a large emergency psychiatry cohort. Materials and Methods: We identified emergency department presentations at Cincinnati Children's Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients younger than 18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss' kappa among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes. Results: Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-versus-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ 0.71-0.94; human-plus-LLM κ 0.70-0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with less than 80% stability to 82.7% for classifications with 100% stability (Pearson r = 0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%). Discussion: Agreement varied by construct and was strongest when repeated model outputs were stable. Conclusion: Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.
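
The comparison of Fleiss' kappa among humans alone versus humans plus the LLM can be illustrated with a minimal sketch. It uses the standard Fleiss' kappa formula; the function names, the two-category coding, and the toy labels are assumptions made for illustration and are not taken from the preprint.

```python
from collections import Counter
from math import nan


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every label appears in `categories` and every item has the
    same number of ratings.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s ** 2 for s in shares)

    if p_expected == 1.0:
        return nan  # only one category was ever used; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def add_rater(ratings, extra_labels):
    """Return a new rating table with one more rater (e.g. the model) per item."""
    return [item + [label] for item, label in zip(ratings, extra_labels)]


# Hypothetical toy data: 4 notes, 4 human raters, binary coding.
CATEGORIES = ["detected", "not_detected"]
D, N = CATEGORIES
human = [[D, D, D, D], [D, D, D, N], [N, N, N, N], [N, N, D, D]]
llm = [D, D, N, N]  # hypothetical model labels for the same 4 notes

print("human-only kappa:", fleiss_kappa(human, CATEGORIES))
print("human-plus-LLM kappa:", fleiss_kappa(add_rater(human, llm), CATEGORIES))
```

Treating the model as one additional rater keeps the two kappa values directly comparable, since both are computed on the same notes with the same formula.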

18
A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Proulx, J.; Daines, B.; Barton, M.; Leonard, M. E.; Garcia, J. A.; Young, B.; Snell, Q.; West, T. W.; Watson, S. R.; AlQaseer, M.; Louiset, M.; Maqsood, M. B.; Voutt-Goos, M. J.; Douma, C.; Kasbekar, N.; Jeffries, J.; Abu-Rahmeh, W.; Frush, K.; Grewal, D. K.; Bahsoun, M.; Leonard, M.; Frankel, A.; Classen, D. C.; Pestotnik, S. L.

2026-06-10 health informatics 10.64898/2026.06.05.26354271 medRxiv
Top 1%
0.5%

Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. 
The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.
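
To make the "naive always-alert classifier" comparison concrete, here is a small sketch of the four headline binary metrics computed from a confusion matrix. The class and function names are hypothetical. The 261 unsafe and 133 safe counts are the Operational tier sizes quoted in the abstract, used purely as an illustration; the results are not the benchmark's reported figures.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class Confusion:
    tp: int  # unsafe scenario, model alerted
    fp: int  # safe scenario, model alerted
    tn: int  # safe scenario, model stayed silent
    fn: int  # unsafe scenario, model stayed silent


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else nan


def binary_metrics(c):
    """Sensitivity, specificity, accuracy and F1 for one confusion matrix."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


def always_alert(n_unsafe, n_safe):
    """Confusion matrix of a classifier that alerts on every scenario."""
    return Confusion(tp=n_unsafe, fp=n_safe, tn=0, fn=0)


# Illustrative only: Operational tier counts from the abstract.
for name, value in binary_metrics(always_alert(261, 133)).items():
    print(f"{name}: {value:.3f}")
```

An always-alert classifier reaches perfect sensitivity with zero specificity, which is why sensitivity alone cannot separate a useful model from one that alerts on everything.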

19
A global cross-sectional survey of health professionals' interest-confidence gaps in value-based health care implementation: a learning needs assessment

Lewis, S.; Andrews, A.; Laing, H.

2026-06-11 medical education 10.64898/2026.06.10.26355253 medRxiv
Top 1%
0.5%

Objectives: Value-Based Health Care (VBHC) increasingly guides health system redesign internationally. Despite the increasing availability of VBHC education, gaps remain between health professionals' conceptual understanding of VBHC and their confidence to implement it in practice. This study assessed perceived learning needs and preferences of healthcare professionals across foundational topics essential to VBHC implementation. Design: Cross-sectional online survey study. Setting and participants: The survey was distributed to the global VBHC community and yielded 518 responses. Most respondents were based in the UK and Ireland (51%) and 65% had more than 10 years of experience in the health sector. Participants represented a variety of professional backgrounds, including clinicians (34%), operational or executive managers and leaders (22%), and life sciences or procurement professionals (13%). Primary and secondary outcome measures: Primary outcome measures included self-reported interest and confidence across 15 VBHC domains and the magnitude of the gap between them. Secondary outcomes included perceived implementation challenges and preferred VBHC learning approaches, including prior engagement with VBHC-related learning. Results: Respondents identified substantial VBHC implementation challenges, including implementing outcome measurement (62.4%), conflicting priorities (57.7%), and resistance to change (56.8%). Interest in all VBHC domains was high (median ≥80/100), while confidence to implement remained substantially lower across most domains (median ≤50/100). The largest interest-confidence gaps were observed for reimbursement mechanisms, costing methodology, and overcoming implementation challenges. Interactive learning approaches, including in-person seminars/workshops (55.2%) and online masterclasses (53.9%), were preferred over self-directed formats. 
Conclusions: This international survey identified consistent gaps between health professionals' interest in VBHC and their confidence to implement key VBHC domains in practice. Addressing these gaps through advanced, targeted and contextual education may support more effective and sustainable VBHC implementation in practice.

20
A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Tharzeen, A.; Vafaei Sadr, A.; Radfar, N.; Hwang, W.; Abedi, V.; Zand, R.

2026-06-10 health informatics 10.64898/2026.06.09.26355176 medRxiv
Top 1%
0.4%

Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we leveraged graph-based machine learning models to predict post-stroke all-cause mortality across three different time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network model for simultaneous multi-horizon stroke mortality prediction (30-day, 90-day, 1-year) using EHR data from Penn State Health System. The model encodes various relations among EHR entities (e.g., patient, diagnosis, comorbidity) and temporal encoding of admission time to better predict stroke mortality. We compared our proposed approach against various baseline methods, including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of learned graph embeddings, and assessed the importance of different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) survived their stroke after one year. 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. Our proposed approach, StrokeTHG, achieved AUROC of 0.872, 0.878, and 0.837 across horizons, outperforming all tabular baselines. At ≥75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis demonstrated consistent performance across sex subgroups and the largest discriminative gains in the Age 65-80 stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relational edges for mortality prediction. 
StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction, with particular advantage at clinically actionable sensitivity thresholds and novel multi-horizon monotonic prediction capability. This methodological framework may be adaptable to other EHR-based clinical research studies seeking to leverage heterogeneous relational structures for predictive modeling.
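
Reading sensitivity at a fixed specificity floor, as in the "≥75% specificity" comparison above, can be sketched as follows. This is a generic threshold sweep in plain Python under stated assumptions: the function name, the "score >= threshold means predicted death" rule, and the toy scores are hypothetical and are not the preprint's evaluation code.

```python
def sensitivity_at_specificity(y_true, scores, min_specificity=0.75):
    """Best sensitivity over all thresholds whose specificity meets the floor.

    y_true: 1 for the event (e.g. death within the horizon), 0 otherwise.
    scores: predicted risk; a case is flagged when score >= threshold.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    best = 0.0
    # Every observed score is a candidate threshold; +inf flags nobody.
    for threshold in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < threshold for s in neg) / len(neg)
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


# Hypothetical toy data: 4 survivors (0) and 4 deaths (1).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(sensitivity_at_specificity(y_true, scores, min_specificity=0.75))
```

Comparing two models at the same specificity floor isolates how many additional true cases one model flags for the same false-alarm budget, which is the kind of difference the abstract reports in percentage points.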